Optimal control for a SIR epidemic model with limited quarantine

Social distance, quarantines and total lock-downs are non-pharmaceutical interventions that policymakers have used to mitigate the spread of the COVID-19 virus. However, these measures could be harmful to societies in terms of social and economic costs, and they can be maintained only for a short period of time. Here we investigate the optimal strategies that minimize the impact of an epidemic, by studying the conditions for an optimal control of a Susceptible-Infected-Recovered model with a limitation on the total duration of the quarantine. The control is done by means of the reproduction number \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma (t)$$\end{document}σ(t), i.e., the number of secondary infections produced by a primary infection, which can be arbitrarily varied in time over a quarantine period T to account for external interventions. We also assume that the most strict quarantine (lower bound of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma $$\end{document}σ) cannot last for a period longer than a value \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tau $$\end{document}τ. The aim is to minimize the cumulative number of ever-infected individuals (recovered) and the socioeconomic cost of interventions in the long term, by finding the optimal way to vary \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma (t)$$\end{document}σ(t). We show that the optimal solution is a single bang-bang, i.e., the strict quarantine is turned on only once, and is turned off after the maximum allowed time \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tau $$\end{document}τ. Besides, we calculate the optimal time to begin and end the strict quarantine, which depends on T, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tau $$\end{document}τ and the initial conditions. We provide rigorous proofs of these results and check that are in perfect agreement with numerical computations.


When should the suppression policy begin in [0, T]?
2. Is it convenient to split the maximum time τ into different intervals? 3. Is it better to apply a strong lockdown followed by mild mitigation measures or not?
In this article we study the previous questions using optimal control tools and numerical computations [24][25][26][27] . The answers clearly depend on the goal, which in our case is to minimize the overall impact of the epidemics in terms of the final number of infected individuals and the social and economic cost of the interventions, which we assume to increase as the quarantine becomes more strict. To account for quarantine measures, we consider a time-dependent reproduction number. Using an optimal control approach we show that the optimal strategy is of a single bang-bang type, that is, the lockdown or strict quarantine is applied in a single interval of time. Moreover, we characterize the time to start and finish the lockdown during the intervention phase. Let us remark that these questions make sense also in SIHR models, which include hospitalized individuals, since it must be necessary to keep the maximum of the hospitalized group below some threshold 28 .
Recently, many works have appeared dealing with these and related issues. The optimal time to start the suppression measures that maximizes this type of objective function was studied by Ketcheson 23 , where it was proved that a bang-bang control is optimal. However, in that work it is assumed that the lockdown corresponds to a zero reproduction number, something that is impossible to achieve in the real world. Moreover, it is assumed that the strict lockdown can last during the whole intervention, which seems to be impracticable. This problem was also analyzed for a different objective function 29 , i.e., minimizing the peak of infected individuals, for which they proved that the optimal policy is not bang-bang. Besides, Kruse and Strack 30 minimize a functional that depends on the number of infectives during the intervention plus a term that measures the social and economic cost of interventions, and prove that the optimal control is bang-bang, but they do not investigate the optimal time to start the suppression policy.
The second question is suggested by the strategy proposed by Ferguson et al. 21 : the lockdown must be turned on and off several times based on the incidence of the virus in the population. A control-theoretic approach was considered in several works 28,[31][32][33] , although no time limits for the interventions were imposed. We shall see that the optimal policy is of bang-bang type, which consists on turning the lockdown on only once and turning it off after the maximum allowed time τ , in agreement with other authors 23,30 .
Finally, the third question involves both suppression and mitigation phases, and one of the policies was colorfully characterized as the hammer and the dance in 34 : a strict lockdown, followed by mitigation measures in order to keep under control the propagation of the disease. However, our main results indicate that the best strategy actually depends on the initial condition, determined by the relation between R 0 and the initial fraction of susceptible individuals x 0 . On the one hand, when x 0 is smaller than 1/R 0 the optimal strategy consists on applying a strong lockdown right at the beginning of the intervention period [0, T] for the maximum time period τ , followed by a mild mitigation measure until the end of the intervention (strong-mild strategy). On the other hand, when x 0 is larger than 1/R 0 the optimal strategy is to apply mild mitigation measures at the beginning of the intervention, followed by a strong lockdown, and then a mild mitigation measure again in some cases (mild-strong-mild strategy). In this case, the optimal time to start the strong lockdown depends non-trivially on the initial condition.
The paper is organized as follows. We start by describing the basic SIR model in "The SIR model" section, and by introducing the SIR model with control in "The SIR control model" section. We then present the main results of the article supported by numerical simulations in "Results" section, followed by a discussion and conclusions including future work in "Discussion and conclusions" section. Finally, in "Methods" section we provide rigorous proofs of the results by applying Pontryagin's maximum principle to the control problem, and we prove that

The SIR model
The basic SIR compartmental model of infectious diseases introduced by Kermack and McKendrik 1 considers a population of individuals that is divided in three compartments with homogeneous characteristics: Susceptible (S), Infected (I), and Recovered or Removed (R). The fraction of susceptible, infected and recovered individuals at time t is denoted by x(t), y(t) and z(t) = 1 − x(y) − y(t) , respectively. It is assumed that each infected individual is in contact with an average number of β random individuals per unit time, and that infects only those who are susceptible ( S → I transition), generating new infections at an average rate βx(t) . Besides, each infected individual recovers at a rate γ ( I → R transition). Births and deaths are neglected, and the recovered population is assumed to no longer infect others and cannot be reinfected. The infection and recovery rates β and γ , respectively, are related to the basic reproduction number σ by σ = β/γ , that is, the mean number of infections produced by a single infected individual in a totally susceptible population ( x = 1 ) during its mean infectious period 1/γ . Then, the evolution of the system is governed by the following set of coupled nonlinear ordinary differential equations (see Hethcote 26 , section 2.1): with (x(0), y(0)) ∈ D = (x 0 , y 0 ) : x 0 > 0, y 0 > 0, x 0 + y 0 ≤ 1 . Here x ′ and y ′ are short notations for the time derivatives dx/dt and dy/dt, respectively. The region D is forward-invariant and there exists a unique solution for all time 26 . Then, since y 0 > 0 , the proportion of infectious individuals is positive at any time. Even though the temporal dynamics of Eq. (1) depends on both σ and γ , the set of system's trajectories on the x − y space depends only on the basic reproduction number σ because γ only affects the overall time scale of the system. In Fig. 1 we depict typical trajectories starting from different initial conditions The system of Eq. (1) is at equilibrium if y(t) = 0 . This equilibrium is stable only if x(t) ≤ 1/σ , a condition referred to as herd immunity. If this condition is not satisfied at the initial time ( x(0) > 1/σ ), then y(t) first increases until it reaches its maximum value at a time t for which x(t) = 1/σ (dashed vertical lines in Fig. 1), and then decreases and approaches zero asymptotically, i.e., y ∞ ≡ lim t→∞ y(t) = 0 . That is, for y(0) > 0 , γ ≥ 0 and σ ≥ 0 is y(t) larger than zero for any finite time t ≥ 0 . The fraction of susceptible individuals x(t) is strictly decreasing, and its value in the long time limit x ∞ ≡ lim t→∞ x(t) is always positive. Therefore, the state of the system in the long time limit consists only of susceptible and recovered individuals, x ∞ + z ∞ = 1 , where z ∞ ≡ lim t→∞ z(t) . Also, it is known that x ∞ ∈ (0, 1/σ ) (see Theorem 2.1 of the work by Hethcote 26 ).

The SIR control model
We now extend the classical SIR model to address the problem of controlling the spread of an epidemics with no access to vaccination, where the only possible control is isolation. We model this non-pharmaceutical intervention via a time dependent reproduction number σ (t) that can be varied in the interval [σ s , σ m ] , where σ s corresponds to a more strict isolation ("strict" quarantine) than σ m ("mild" quarantine), with 0 ≤ σ s < σ m , and assume that this intervention can only be applied over a finite time interval [0, T]. Here T is the length of the intervention period. After the intervention, the restrictions are removed, thus the disease spreads freely and σ (t) = σ f ≥ σ m for all t > T . We think of the control parameter σ (t) as capturing political measures such as 0.600, 10 -6 (b) Figure 1. Trajectories of the system in the x − y phase space for the SIR model Eq. (1) with basic reproduction number σ = 1.5 (a) and σ = 2.2 (b), where x and y are the fractions of susceptible and infected individuals, respectively. Each curve corresponds to a trajectory starting from a given initial condition (x(0), y(0)), as indicated in the legends. Arrows denote the direction of the time evolution of x and y. The vertical dashed line corresponds to the critical value x = 1/σ where y ′ = 0. www.nature.com/scientificreports/ social distancing, and the lockdown of businesses, schools, universities and other institutions. Then, the system evolves according to the following set of coupled nonlinear ordinary differential equations: with (x(0), y(0)) ∈ D = (x 0 , y 0 ) : We also assume that during the intervention period [0, T] it is not possible to impose an extremely restrictive isolation for a long time. Thus, we consider that the strict quarantine-corresponding to σ s -can last at most for a fixed time period τ , with τ ∈ (0, T) . Once the period of intervention is is the solution of the system of Eq. (2) with initial condition (x(T), y(T)) and constant reproduction number σ (t) ≡ σ f for t > T . Note that in this case, from Hethcote 26 While political measures reduce the spread of the disease, they often come at an important economic and social cost. A long and strict quarantine can be very effective at reducing contagions, but at the expense of having a negative impact on the economy. Our goal is to find the optimal control on the SIR model described above that minimizes the total damage of a pandemic in terms of both, the total number of infections and also the socioeconomic costs. We model this trade-off by considering a global cost capturing the total number of individuals that were infected during the epidemics, i.e., those who are recovered in the long-time limit z ∞ , and the socioeconomic cost of shutting down society during the intervention on [0, T], which we assume to increase as σ decreases (more restrictions). In order to find the optimal σ (t) it proves convenient to work with the fraction of susceptible individuals in the long term x ∞ instead. Then, given that minimizing z ∞ is equivalent to maximizing x ∞ , since x ∞ + z ∞ = 1 , we define the functional where Our goal is to maximize the functional J, which has the following interpretation. The first term of J is the fraction of individuals that remain susceptible in the long term x ∞ , and that we want to keep as large as possible subject to the condition of maximizing the second term of J as well, the functional C(σ ) . The functional C(σ ) is taken to be inversely proportional to the socioeconomic cost of the intervention ( C(σ ) increases as the cost decreases), as the function L is assumed to be a monotonously increasing function of σ . Then, an increase of the socioeconomic cost is achieved by decreasing σ (more restrictions or stricter quarantine), which leads to decreasing L and consequently C(σ ) . Therefore, we see that there is a non-trivial competition between the two terms of Eq. (3), given that by decreasing σ the value of x ∞ increases, while C(σ ) decreases.
In the next section we describe the main results about the optimal control and we test them via numerical simulations.

Results
As mentioned in the last section, the optimal control is given by the shape of σ (t) that minimizes both, the final number of infected individuals and the socioeconomic costs, which corresponds to maximizing the functional J from Eqs. (3) and (4). From now on we restrict ourselves to the case where the socioeconomic cost of imposing a quarantine is linear in the control σ (t) . This is a simplified first approach that narrows the analysis of the general problem formulated in "The SIR control model" section but, as we shall see, has the advantage of providing further insight into the structure of the optimal policy. The assumption that the cost of socioeconomic measures that reduce the transmission rate is linear in σ can be interpreted in the context of social distancing as given by Kruse and Strack 30 : "Shutting down half of the economy for two days is equally costly as shutting down the whole economy for a single day". Then, we consider that the function L in Eq. (4) is a linear and increasing function that depends only on the control σ , that is L(σ (t)) = κσ (t) , which satisfies the condition of being a monotonically increasing function of σ expressed in the last section. In this case, the parameter κ ≥ 0 could be interpreted as the assessment that a policy maker gives to the socioeconomic impact of the quarantine compared to the final number of infected individuals. In this regard, κ is a fixed real number that can be chosen small enough by a government that intends to reduce the final number of infected individuals regardless the socioeconomic impact, or can be chosen large enough by a government that can face a large number of final infected and intends to control the socioeconomic impact. Here we mainly focus on the case where κ is small (see condition Eq. (33) in "Methods" section). Also, when κ is large enough we prove that the optimal strategy consists in calling off the lockdown and take mild mitigation measures for all the intervention period (see Lemma 6 in "Methods" section).
Under these conditions, we prove in "Methods" section that the optimal control is of the form of a single bangbang. This consists on turning the strict quarantine on only once and switching it off after the maximum allowed time τ or, eventually, when the intervention ends at time T, depending on the initial condition and the values of the strict and mild reproduction numbers σ s and σ m , respectively. Then, the problem is reduced to find the optimal time to start the strict quarantine, which we call t * , and its length called η * . We also show in "Methods" section that the length of the strict quarantine η * for the optimal control could be less than the maximum time www.nature.com/scientificreports/ τ in some cases, as we shall see below. This means that, surprisingly, sometimes it is more convenient to make a shorter use of the strict quarantine to obtain better results in terms of pandemic costs. We analyze the optimal control in three different case scenarios: i) κ = 0 , σ m = σ f and σ s = 0 , ii) κ = 0 , σ m = σ f and σ s > 0 and iii) κ > 0 and 0 < σ s < σ m < σ f . The optimal times t * and η * for each case are given in Corollary 1, Theorems 2 and 3, respectively, of "Methods" section, where the interested reader can find rigorous proofs. The optimal control for the different cases, given by t * and η * , is summarized and numerically tested below for specific parameter values. For that, we integrate the system of Eq. (2) using the Adams' method 35,36 , for various time periods τ of the strict quarantine ( σ = σ s ) in the bang-bang control, starting at time t but ending before T, which is the control period ( σ (t) = σ f for t > T ). That is, the value of σ (t) adopts the following form, depending on t, τ and T: where We start by testing the simplest case κ = 0 , σ m = σ f and σ s ≥ 0 , and we then test the most general case κ > 0 and 0 < σ s < σ m < σ f . We take the value γ = 0.1/day for the recovery rate, which corresponds to a mean recovery time of 10 days that falls within the range of Covid-19 estimates 23 . This value of γ sets the time scales of the system. For now on all time scales are given in units of "days", even though we omit units for the sake of simplicity.
Case κ = 0 , σ m = σ f and σ s = 0 (Corollary 1). We first analyze the case κ = 0 with a bang-bang control in the interval [0, T] that consists of a mild quarantine (σ m = 1.5) and an extremely strict and unrealistic quarantine ( σ s = 0 ) during which there are no infections. The other parameter in the simulations is T = 260 , together with the initial condition y 0 = y(t = 0) = 10 −6 and x 0 = x(t = 0) = 1 − 10 −6 . We can see from Corollary 1 that the optimum initial time of the strict quarantine is t * = 0 for .
As κ = 0 , x ∞ reaches a maximum value when the strict quarantine starts at the optimal time t * [see Eq. (12)]. In Fig. 2 we plot t * vs w(0) for τ = 10 , calculated from Eq. (54) (squares) and by estimating the maximum of x ∞ (circles). We can see that t * takes values close to zero for w(0) ≤ 0 . In the rest of this section we consider the case w(0) > 0.
The behaviour of t * from Eq. (7) for x 0 > 1/σ f ( w(0) > 0 ) is tested in Fig. 3a, where we compare numerical results (circles) with that obtained from Eq. (7) (squares, Corollary 1). We observe that the agreement between numerical computations and Corollary 1 is very good. Figure 3b is an auxiliary plot that shows how to obtain

Remark 1
The effective reproductive number R σ t ≡ σ x σ (t) represents the mean number of individuals that an agent infects during its infectious period, at time t. It is interesting to note that the optimal time from Eq. (7) can be rewritten in terms of R σ t as Here where t and t are determined from the relations In Fig. 4 we show the evolution of the system in the x − y phase space for a given τ and various t (right panels), together with the evolution of σ (t) (left panels), which describe the three different behaviours of t * . All curves start at (x 0 , y 0 ) = (1 − 10 −6 , 10 −6 ) and follow the top curve with mild quarantine ( σ = σ f ) until the strict quarantine starts at t ( σ = σ s = 0 ), vertically falling down up to a lower level curve when the mild quarantine starts, and finally following this curve until the fixed point (x ∞ , 0) is asymptotically reached. The vertical trajectory describes the evolution within the strict quarantine where x(t) remains constant, given that σ (t) = σ s = 0 in that period. The optimum time t * that leads to the maximum of x ∞ corresponds to the time for which y(t) drops to the lowest level curve in the interval [t, t + η] (pink curve). For τ = 6 < 7.29 = τ ( Fig. 4 top panels) we see that the maximum of x ∞ is reached starting the strict quarantine at t * = t = 252.71 , where the effective reproduction number is R and thus there is no new outbreak when the strict quarantine is released ( dy dt | t+η = 0 ). In this case the entire quarantine period η = τ is used. For τ < τ = 12 <τ = 21.22 ( Fig. 4 middle panels) the optimum initial time is t * = T − τ = 248 < t , obtained by still using the entire strict quarantine period but starting earlier than t . Finally, for τ = 26 >τ (Fig. 4 bottom panels) the optimum is t * =t = 238.78 > T − τ = 234 , where it turns more effective to use the strict quarantine for a shorter time T − t * < τ . Notice that implementing a shorter but later strict quarantine is more efficient than using a longer and earlier strict quarantine, as we can see by comparing σ (t) for η = 26 and η = 21.22 in the bottom panels for the τ = 26 case.
Case κ = 0 , σ m = σ f and σ s > 0 (Theorem 2). We now analyze the case κ = 0 , σ m = σ f = 1.5 and σ s = 0.3 > 0 , with T = 260 . This corresponds to a strict quarantine that is softer than in the previous case σ s = 0 , and during which there are infections. Initially is y 0 = 10 −6 and x 0 = 1 − 10 −6 . We can see from Theorem 2 that the optimum initial time of the strict quarantine t * for w(0) > 0 is .
, respectively, can be seen from the definition of w(t) in Eq. (42).
In Fig. 5a we compare numerical results (circles) with results from Eq. (10) (squares, Theorem 2), where we see a very good agreement. At t * , x ∞ reaches a maximum. Unlike the σ s = 0 case, for σ s = 0.3 > 0 the optimal time t * in the 0 ≤ τ ≤ τ ≃ 8.01 interval depends on τ , that is, t * = t(τ ) , while for τ >τ ≃ 23.87 is t * =t ≃ 236.13 independent of τ . Figure 5b,c show that the optimal times t and t are estimated, respectively, as the values of t = T − τ for which the curve w(T − τ ) crosses the horizontal line 0 and the curve 1/ γ y t,T−t (T − τ ) . Figure 6 is analogous to Fig. 4 for the σ s = 0 case, and depicts the three different behaviours of t * . Curves are similar to those of σ s = 0 , where the main difference is that for σ s = 0.3 > 0 the trajectory of the system within the strict quarantine in the x − y space is described by a diagonal line (see inset of top-right panel), given that σ s is larger than zero and thus x(t) decreases in this period. At the optimum time t * , y(t) drops to the lowest level curve in the interval [t, t + η] (pink curves).
General case κ > 0 and 0 < σ s < σ m < σ f (Theorem 3). In this section we analyze the most general case κ = 10 −5 > 0 , with a mild quarantine ( σ m = 1.5 ) together with a strict quarantine ( σ s = 0.3 < σ m ) during the control interval t ∈ [0, T] , and with σ (t) = σ f = 2.2 > σ m for the case of no restrictions after the control period t > T . We take T = 320 , and the rest of the parameters are the same as those in the previous studied cases. Then, from Theorem 3 the optimum initial time t * is given by Here α(t) is given by Eq. (36), whereas the dependence and independence of w(t) on τ for t ∈ [0, T − τ ] and t ∈ [T − τ , T] , respectively, is seen in the definition of w(t) in Eq. (35).
Given that we consider here κ > 0 , J reaches a maximum at the optimum time t * (see Eq. (12)). Figure 7a shows the behaviour of t * as a function of τ , where we observe a very good agreement between numerical results (circles) and Theorem 3 (squares). We also see that t * depends slightly on τ in the 0 ≤ τ ≤ τ ≃ 9.65 interval, while t * =t ≃ 291.46 for τ >τ ≃ 28.54 . The optimal times t and t are estimated as the values of t = T − τ for which the curve w(T − τ ) crosses the horizontal line 0 and the curve α(T − τ ) , respectively (Fig. 7b,c).
In the right panels of Fig. 8 we show the system's evolution in the x − y space for three different values of τ corresponding to the different behaviour of t * . Unlike the previously studied cases where σ m = σ f (Figs. 4 and 6), here we observe that the curves (x(t), y(t)) may exhibit up to three different regimes within the control period www.nature.com/scientificreports/ T, which is due to the fact that σ jumps three times in that interval: from σ m to σ s at time t, from σ s to σ m at t + τ and from σ m to σ f at T. This can be clearly seen in the t * = 310.35 curve for τ = 5 < τ (inset of top-right panel of Fig. 8). For τ in the other two regions ( τ = 18 and 34), the strict quarantine ends at T for t * , and thus σ jumps twice and (x(t), y(y)) exhibits two regimes in [0, T] (insets of middle-right and bottom-right panels). As in the previously studied cases, y(t) drops to the lowest level curve in the interval [t, t + η] for the optimum time t * (pink curves).

Discussion and conclusions
In this paper, we have studied an optimal control problem on a SIR dynamics, with a control on the reproduction number σ (t) and a limitation in the duration of the intervention T and strict quarantine. Based on the Pontryagin's maximum principle, we have given first order necessary conditions with an overall cost of the epidemic that takes into account both the maximization of the susceptible population in the long term (equivalently, a minimization of the ever infected population) and a penalization of the lockdown associated to a social and economic cost of the epidemic. We also point out that we have employed a novel proof to establish our analytical results. Moreover, some numerical examples have been provided to show the validity of our theoretical results. Given a fixed time of intervention T where control strategies can be applied, and a strict quarantine period τ < T that represents the maximum time lapse for the stronger intervention, we proved that the optimal strategy is bang-bang when the term representing the socioeconomic cost of the objective functional is linear with respect to the control σ . More precisely, the optimal solution consists of switching at most twice between a mild ( σ = σ m ) and a strict ( σ = σ s < σ m ) quarantine, where the latter lasts at most a time period τ.
Although some studies have supported the idea that a too soon or too late intervention may not minimize the total mortality, we found a broader scenario. This is because the optimal solution takes the value σ = σ s corresponding to the lockdown on an interval [t * , t * + η] ⊆ [0, T] , with t * and η ≤ τ depending on the initial fractions of susceptible and infected individuals x 0 and y 0 , respectively, and the parameters γ , τ , σ f , σ s , σ m and T. In fact, we showed that, in some cases, the optimal strategy consists of taking t * = 0 or t * + η * = T (see Theorem 2 items 1 and 3-4 respectively). However, for an initial condition that corresponds to a real-life case scenario in which the percentage of the population that is infected is small when non-pharmaceutical interventions start, we obtained that the optimal strategy consists on delaying the beginning of the lockdown (items 2-4 from Theorem 2). For the case τ ≪ T and x 0 < 1/R 0 , this optimum consists in applying a mild mitigation policy ( σ = σ m ) at the beginning of the intervention, followed by a strong suppression policy ( σ = σ s ) and then a mild mitigation again (mild-strict-mild strategy). Here the optimum time t * to start the strict quarantine corresponds to one that leaves the effective reproduction number σ (t * + τ )x(t * + τ ) at or just below the threshold value 1.0 when the strict quarantine is released, preventing a new outbreak. For the case τ T the optimum corresponds to a mild-strict mitigation strategy, with a strict quarantine that starts late and lasts for a period shorter than τ . Surprisingly, it turns more effective to implement a short strict quarantine that starts late than a long strict quarantine that starts early.
We remark that these are optimal strategies within the basic SIR model defined in Eq. (2), which describe in an oversimplified manner the spread of an epidemic on an infinitely large population of individuals with homogeneous recovery, contagion and contact rates, where stochastic fluctuations due to finite-size effects are neglected. Then, stochastic fluctuations in the SIR model on finite populations may play a major role at the beginning of the epidemics if the fraction of infected individuals is relatively small, and thus starting with a strict quarantine www.nature.com/scientificreports/ may prove more effective if we want to drive the epidemic to extinction. However, we expect that the results presented in this article hold in the limit of very large populations. We have also studied the possibility of implementing intermittent quarantines, and the possibility of applying suppression measures first, followed by mitigation measures. In both cases, if the total duration of measures is limited, we have shown that they are not optimal in order to maximize the fraction of susceptible individuals at the end of the pandemic.
A major concern with respect to the current COVID-19 crisis is the possibility of an overload of available treatment resources. Since the hospitalized individuals are a fraction of the infected population, a natural objective is to keep the number of infected individuals below some threshold for all times. In a future work we intend to extend our analytic results including a running state constraint that takes this restriction into account. It might also be interesting to study the agent-based version of the SIR model, which naturally accounts for finite-size fluctuations, in order to investigate the role played by stochastic fluctuations in the different optimal strategies described above. It would be worthwhile to explore how the results are affected by the heterogeneity in recovery and infection parameters related to age and social stratum. Finally, we also aim to study the role of an underlying network of contacts, and changes in contact rates due to individual measures triggered by fear of contagion.

Methods
Formalization of the optimal control problem. As said in "Results" section, we assume that the function L [integrand of C(σ ) ] depends linearly on the control σ (t) . This is a simplified first approach which provides further insight into the structure of the optimal policy. In fact, under this linearity assumption we will prove that the optimal control must be bang-bang (Lemma 2), that is, the strict quarantine is turned on and off. Thus, we consider that L(σ (t)) = κσ (t) with κ ≥ 0 . In this case the functional J reads Moreover, we consider a restriction on the admissible controls σ that assumes that the control can take the value σ s for at most τ time, and also takes into account a maximum economic cost that the policy maker can afford. In regard to the latter, we consider a maximum cost for imposing the strict lock down for the entire period τ . Smaller values of σ represent stricter measures and thus a larger socioeconomical cost. Therefore, we impose an inferior bound to the average of σ on [0, T], meaning an upper bound for the socioeconomic cost. Thus, we consider the restriction Note that any control σ ∈ [σ s , σ m ] satisfying Eq. (13) takes the value σ s for a period no longer than τ . In fact, if σ is a control that takes the value σ s for a longer period of time than τ , for instance τ > τ , then we would have that T 0 σ (t)dt ≤ σ sτ + σ m (T −τ) < σ s τ + σ m (T − τ ) contradicting the inequality from Eq. (13). We are now in conditions to formalize the problem of finding the optimal control that maximizes the functional J. Given (x 0 , y 0 ) ∈ D , T, τ fixed satisfying 0 < τ < T , 0 ≤ σ s < σ m ≤ σ f , we then consider the following optimal control problem with an objective function J : In what follows we compute the partial derivatives of x ∞ (x(t), y(t), σ ) with respect to x(t) and y(t) in the same way that it is done in 23 . We begin reviewing the solution of the SIR model without control Eq. (1) as done by 23 . It can be shown 1 that x(t) satisfies x(t)e σ z(t) = x 0 e σ z 0 which combined with the identity z(t) = 1 − x(t) − y(t) implies that is constant in time for any solution of Eq. (1). The trajectories in Fig. 1 are also contour lines of µ . Since y ∞ = 0 , we then have that x ∞ = x 0 e σ (x ∞ −x 0 −y 0 ) = µ(x 0 , y 0 , σ )e σ x ∞ . Then w = −σ x ∞ satisfies the equation we w = −σ µ(x 0 , y 0 , σ ) and therefore w = W 0 (−σ µ(x 0 , y 0 , σ )) where W 0 is the principal branch of Lambert's W−function 37 , and thus for any (x, y) ∈ D µ(x(t), y(t), σ ) := x(t)e −σ (x(t)+y(t)) Scientific Reports | (2022) 12:12583 | https://doi.org/10.1038/s41598-022-16619-z www.nature.com/scientificreports/ From this expression we can compute the partial derivatives of x ∞ (x(t), y(t), σ ) with respect to x(t) and y(t) in the same way that it is done in 23 .
In order to solve problem Eq. (14) by means of the Pontryagin's Maximum Principle, we add a new state variable given by v(t) = t 0 σ (s)ds and consider σ : [0, T] → [σ s , σ m ] in the class of Lebesgue-measurable functions (so that we have an existence result for optimal solution). Thus, we can study the equivalent optimal control problem: We will refer to a 4-tuple (x, y, v, σ ) as an admissible process of the underlying control system if the control σ is a measurable function and the state (x, y, v) is an absolutely continuous vector function satisfying Eqs. (16b)-(16f). The optimal control problem consists in finding an optimal admissible process (x * , y * , v * , σ * ) that maximizes the cost J. In this case, we refer to the control σ * as optimal control.
Next, we give a result on existence of solution for the optimal control problem Eq. (16).

Proposition 1 The optimal control problem Eq. (16) admits a solution.
Proof Since L(σ ) = κσ is continuous, convex and satisfies that there exists a constant α 0 such that for all σ ∈ [σ s , σ m ] it holds L(σ ) ≥ α 0 , the proof follows directly from Theorem 23.11 24 . Using that the control space is closed, solutions of the system of Eqs. (16b), (16c) satisfy that 0 ≤ x + y ≤ 1 and the application x ∞ (x, y) is continuous, it is straightforward to prove conditions (a) to (f) from Theorem 23.11. Moreover, taking σ (t) ≡ σ m , we see that the unique solution of the system of Eqs. (16b)-(16d) together with σ gives an admissible process for which J is finite completing the hypothesis of Theorem 23.11.

Proof
Proof Assume φ(t) = 0 on an interval [a, b] ⊂ [0, T] , then computing its derivative we obtain Thus, from Eq. (18a) 1 (t) = 2 (t) = 0 for all t ∈ (a, b) and therefore for all t ∈ [0, T] , contradicting the end point conditions. Then, there cannot be singular arcs and the control σ * is given by:

Lemma 3
Let (x 0 , y 0 ) be given and (x, y, v, σ ) an admissible process. Then for t ≥ 0 and therefore Proof See 23 .
In the next lemma we will see that the switching function changes sign at most two times, concluding that an optimal control σ * jumps at most twice.

Lemma 4
The switching function φ given in Eq. (22) changes sign at most twice.
Thus, since the end time conditions are on the region of the phase diagram with 1 > 2 and 2 < 0 we have that the solution backwards in time moves to the left where 1 decreases and 2 keeps being negative. At some point in time it could cross the semiline 1 = 2 < 0 (note that there cannot be touch points). If the solution crosses this line it cannot cross the semiline 1 = 2 > 0 for a previous time and thus it stays on the region 1 < 2 for all previous times. From the definition of φ Eq. (22), for 1 < 2 , we have φ > 0 . For 1 = 2 , φ = κ + 3 (T) ≥ 0 and for 1 > 2 , φ could become negative. Since φ ′ (t) = γ x * (t)y * (t) 1 (t) , we see that for t ∈ [0, T] such that ( 1 (t), 2 (t)) is on the region 1 > 2 the function φ decreases for such t ′ s with 1 (t) < 0 and increases for 1 (t) > 0 . Also, let s 0 , s 1 ∈ [0, T] such that s 0 < s 1 , 1 (s 0 ) = 2 (s 0 ) < 0 and 2 (s 1 ) < 1 (s 1 ) = 0 , then φ(s 0 ) = κ + 3 (T) , φ reaches the minimum value on [t 0 , T] at s 1 and φ(T) = κ Thus, we conclude that φ has at most two zeros on [0, T] and the proof is finished.

Theorem 1
Let (x * , y * , v * , σ * ) be an optimal process, then: Proof Since the optimal control must be bang-bang satisfying Eq. (25), from Lemma 4 it has at most two jumps and from Eq. (13), it takes the value σ s at most for τ time. Thus the proof is completed.
As a consequence of Theorem 1 we have that if (x * , y * , v * , σ * ) is an optimal process, then the optimal control σ * is a piecewise constant function having at most two jumps and therefore its unique associated state (x * , y * , v * ) is a piecewise continuously differentiable function.
Characterization of the optimal control. In this section we will give the main Theorem of the article, that characterizes the switching times t 1 and t 1 + η (where t 1 is the beginning of the lockdown and η is its duration) for an optimal control.
Let us consider the compact set (see Fig. 9) Given (t 1 , η) ∈ R , for simplicity of notation, we will denote t 0 = 0, t 2 = t 1 + η , t 3 = T. Moreover, given (t 1 , η) ∈ R , we will denote the solution of equation Eqs. (16b), (16c) for s ≥ t by In order to do that, we need to compute the derivatives of J.
After some computations (see Eqs. (73) and (76) from Appendix) we obtain that and where x ∞,t 1 ,η = x ∞ (x t 1 ,η (T), y t 1 ,η (T), σ f ) . Note that for (t 1 , η) ∈ R , ∂J ∂η (t 1 , η) > 0 if and only if In the results given in the next sections we will assume that ∂J ∂η (t 1 , η) > 0 for all (t 1 , η) ∈ R and therefore in the following remark we analyse the derivatives of J restricted to the superior border of R (see Eq. (37) and (38)) which will be used later.

Remark 2
Assume that ∂J ∂η (t 1 , η) > 0 for all (t 1 , η) ∈ R , then the maximum value of J on R must be attained at the superior border Thus, in this case, for (t 1 , τ ) with t 1 ∈ [0, T − τ ) we have t 2 = t 1 + τ and for (t 1 , T − t 1 ) with t 1 ∈ [T − τ , T] we have t 2 = T , and the control is as in Fig. 10 Following, we define a continuous function w for t ∈ [0, T] which is the second factor in equation Eq. (31) and thus gives information on the monotonicity of J (see also Eqs. (40) and (41) below). Then we have that for (t 1 , η) ∈ P and for (t 1 , We consider J the continuous function defined as the restriction of J(t 1 , η) to P, that is From Eq. (37) we have that and from Eqs. (37) and (38), we get In the next subsection, we prove the main result of this article (Theorem 2) for the case σ m = σ f , σ s ≥ 0 and κ = 0 . Then, we derive the result for σ s = 0 (Corollary 1) in order to compare our result with the one obtained in 23 .
In this case, the sign and zeros of w(t) for t ∈ [0, T − τ ] are the same as those for the function where z(t) can be interpreted as the average on the time window [t, t + τ ] of the difference between the effective reproduction number σ f x t,τ (r) and the threshold 1.0, weighted by the inverse of y t,τ . Looking at the phase diagram of Fig. 1 we see that the trajectories travel through the contour line µ(x 0 , y 0 , σ m ) until the strict lockdown is activated, and then descends for the time the lock down lasts (always less than τ ) to another contour line of µ with σ m . For κ = 0 , the zero of w(t) ( t * ) captures the moment when this travel to a lower contour line is faster, in the sense that leaves the trajectory on the lowest possible contour line at the end of the strict lockdown and, therefore, at the maximum of x ∞ . The function z(t) can also be seen as an external parameter that becomes zero at the optimal time t * for which J reaches its maximum, i.e., z(t * ) = 0 . In the next remark we discuss the sign of z(t) for t ∈ [0, T − τ ] when σ m = σ f . Fig. 11). Then, for t ∈ [s 1 , www.nature.com/scientificreports/ and therefore x t,τ (r) < 1/σ f for r ∈ (t, t + τ ] implying z(t) < 0 . Additionally, assume there exists s 0 ∈ [0, T − τ ] such that x s 0 ,τ (s 0 + τ ) = 1/σ f (blue line). In this case, it is clear that s 0 < s 1 and also that for all t ∈ (s 0 , s 1 ) there exists a unique s t ∈ [t, t + τ ] such that x t,τ (s t ) = 1/σ f . Moreover we can conclude that for t ∈ [0, s 0 ] , x t,τ (r) ≥ 1/σ f for all r ∈ (t, t + τ ) and therefore z(t) > 0 . If s 1 defined before does not exist, that is x s,τ (s) > 1/σ f for all s ∈ [0, T − τ ] , then we take s 1 = T − τ . Likewise, if s 0 does not exist, that is for all s ∈ [0, T − τ ] , x s (s + τ ) < 1/σ f , then we take s 0 = 0 and conclude that in either case, the sign of z(t) for t ≥ 0 , is determined in the complement of [s 0 , s 1 ].
In the next lemma we prove that z is a decreasing function on the interval (s 0 , s 1 ) introduced in Remark 3. Proof Since the duration of the strict quarantine is τ fixed, for simplicity of notation in this proof we neglect the subindex τ from solutions x and y. Given t ∈ (s 0 , s 1 ) , we define the auxiliary functions By computing the derivative for s ∈ (t, t + τ ) we obtain and we have that Note that both g ′ t and i ′ t have the opposite sign of (σ f x t (s) + σ f y t (s) − 1) . First, assume σ f x t (s) + σ f y t (s) − 1 > 0 for all s ∈ (t, t + τ ) , then from Eq. (46a), g t is a decreasing function. Moreover, since σ s x t (t + τ ) − 1 < 0 and σ f x t (t + τ ) − 1 < 0 for t ∈ (s 0 , s 1 ) , we deduce that g t (s) > g t (t + τ ) > 0 . In addition, from Eq. (45c) we obtain i t is positive.
On the other side, from the fact that x + y is a decreasing function, if we assume that there exists x y Figure 11. Trajectories (x t,τ , y t,τ ) for t ∈ [s 0 , s 1 ] when σ m = σ f . www.nature.com/scientificreports/ s > s 3 , σ f x t (s) + σ f y t (s) − 1 < 0 . Therefore, g t and i t attains a global minimum on [t, t + τ ] at s 3 and thus y t (r) dr > 0 for all s ∈ [t, t + τ ] . Furhtermore, from Eq. (46b) we have that f t (s) is also a decreasing function.
From Eq. (40), we see that the monotonicity of J on [0, T − τ ] depends on the sign of w. For all the foregoing, if J attains its maximum at t 1 = t * ∈ [0, T − τ ] , we can interpret w(t) as an external parameter that shifts the extremal point of J from t * to zero.
In the next theorem we assume that x 0 < 1/σ s . This condition is always satisfied for σ s < 1 .
: t * =t where t is the unique value on [T − τ , T] such that w(t) = 1 γ y˜t ,T−t (t) and η = T −t.

Data availability
All relevant data are within the paper.

Appendix
We b e g i n b y c o m p u t i n g t h e d e r i v a t i v e s o f x t 1 ,η (T) = � 1 (T, t 1 + η, x 2 , y 2 , σ m ) a n d y t 1 ,η (T) = � 2 (T, t 1 + η, x 2 , y 2 , σ m ) with respect to t 1 . We recall two properties for the solutions of ordinary differential equations. First, the relation between the derivative with respect to initial time and the derivatives with respect to initial data give us the equation for defined in Eq. (28), with s ≥ t , σ ∈ {σ s , σ m } , initial data (x, y) ∈ D at time t and j = 1, 2.